HaiBiostat: Series 6 -- ANOVA & ANCOVA

Hai Nguyen

If we pay a bit attention to the ANOVA and ANCOVA, we can see the interesting point of views in these cases:

they greatly depend on assumptions (homoscedastic, normally distributed)
“Bayes” perhaps not call the terms ANOVA or ANCOVA. The analysis is based on linear regression’s hierarchical modeling with the restriction on some beta coefficient distributions. Thus, “Bayes” would call the metric response with nominal predictor with/without metric predictor.

One-way ANOVA

This type of analysis actually is a model with metric output and one nominal predictor.

A diagram for metric output and one nominal predictor shows parameters denoted as \(\alpha\) instead of traditional \(\beta\).

We represent the nominal predictor by a vector \(\vec{x}\) = {\(x_{1}, ..., x_{J}\)}, where \(J\) is the number of categories that the predictor has. When an individual falls in group \(j\) of the nominal predictor, this is represented by setting \(x_{j} = 1\) and \(x_{i\neq j} = 0\). The predicted value, denoted \(\mu\), is the overall baseline plus the deflection of a group:

\[\begin{align*} \mu &= \beta_0 + \sum_j \beta_{j} x_{j} \\ &= \beta_0 + \vec{\beta} \cdot \vec{x} \end{align*}\]

The overall baseline is constrained so that the deflections sum to \(0\) across the categories:
\[ \sum_j \beta_{[j]} = 0 \]

In order to satisfy this constraint, first, find unconstrained slopes \(\alpha_1, ..., \alpha_J\) and then adjust them using formulas:

\[\begin{align*} \mu &= \alpha_0 + \sum_j \alpha_{j} x_{j} \\ &= (\alpha_0 + \bar{\alpha}) + \sum_j (\alpha_{j} - \bar{\alpha}) \cdot x_{j}\\ &= \beta_0 + \sum_j \beta_{j} x_{j} \end{align*}\]

where \(\bar{\alpha} = \frac{1}{n} \sum_j \alpha_j\)

On the diagram the within-group standard deviation \(\sigma\) is given a broad uniform prior, the intercept is given a broad normal prior centered at \(0\) and the group coefficients come from zero-centered normal distribution.

Interpretation of intercept is the grand mean value.

The standard deviation of slopes \(\sigma_{\beta}\) may be either a constant or may have a prior of its own.

In case when \(\sigma_{\beta}\) = const means of all groups are estimated separately, they do not inform each other through estimation of \(\sigma_{\beta}\). A large number for \(\sigma_{\beta}\) will make the results close or equivalent to ANOVA.
Giving \(\sigma_{\beta}\) a prior affects shrinkage rate. If all group means are close to the grand mean then \(\sigma_{\beta}\) is estimated as small number. This makes shrinkage stronger.

A key assumption behind estimating \(\sigma_{\beta}\) from the data is that all groups can be meaningfully described as representatives of shared higher-level distribution.

Example. Giving \(\sigma_{\beta}\) a prior distribution is not justified if, for example, in medical studies there are several control groups and one treated group. In this case small difference between placebo groups may result in significant shrinkage (bias) of the treated group.

In this example the treated group is an outlier. Heavy-tailed prior distribution for \(\beta_j\) may help to overcome excessive shrinkage.

Possible priors for \(\sigma_{\alpha}\):

No prior, i.e. parameter is not estimated from the data. Estimated group means do not inform each other through \(\sigma_{\alpha}\)
Gamma distribution or any other distribution concentrated on positive part of real line with shape allowing large deviations from zero
Distributions on non-negative part of real line allowing large values, but also giving positive probability to zero value: half-Cauchy, folded t-distribution. Should be careful with this option since with small data set it may cause an implosive shrinkage.

Another interpretation of imploding shrinkage is: the model has to make choice between assigning more variability to between the groups (\(\sigma_{\alpha}\)) or shrinking the groups and assigning more variability to within the groups noise (\(\sigma_{y}\)).
Whenever it is possible the model will prefer the latter.
Shrinkage will not be very efficient in case of binary predictor.

Sex and Death example of section 19.3 in (Kruschke 2015)

The fruit fly, Drosophila melanogaster, is known for this species that newly inseminated females will not remate (for at least two days), and males will not actively court pregnant females. There were 25 male fruit flies in each of the five groups.

CompanionNumber: group

None0: males with zero companions
Pregnant1: males with one pregnant female
Pregnant8: males accompanied by eight pregnant females
Virgin1: males accompanied by one virgin female
Virgin8: males accompanied by eight virgin females

Research Question: estimate the life spans and the magnitude of differences between groups.

#read the data, the file is available at [K].
dta <- read.csv("data/Fruitfly.csv")
names(dta)

[1] "Longevity"       "CompanionNumber" "Thorax"

datatable(dta)

Traditional analysis

library(tidyverse)
grMeans<-group_by(dta, CompanionNumber) %>%
  summarise(
    count = n(),
    mean = mean(Longevity, na.rm = TRUE),
    sd = sd(Longevity, na.rm = TRUE)
  )
levels(as.factor(dta$CompanionNumber))

[1] "None0"     "Pregnant1" "Pregnant8" "Virgin1"   "Virgin8"

grMeans

# A tibble: 5 × 4
  CompanionNumber count  mean    sd
  <chr>           <int> <dbl> <dbl>
1 None0              25  63.6  16.5
2 Pregnant1          25  64.8  15.7
3 Pregnant8          25  63.4  14.5
4 Virgin1            25  56.8  14.9
5 Virgin8            25  38.7  12.1

# Box plots
# ++++++++++++++++++++
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(dta, x = "CompanionNumber", y = "Longevity", 
          color = "CompanionNumber", 
          order = c("None0", "Pregnant1", "Pregnant8", "Virgin1", "Virgin8"),
          ylab = "Longevity", xlab = "Companion Number")

ggline(dta, x = "CompanionNumber", y = "Longevity", 
       add = c("mean_se", "jitter"), 
       order = c("None0", "Pregnant1", "Pregnant8", "Virgin1", "Virgin8"),
       ylab = "Longevity", xlab = "Companion Number")

# Compute the analysis of variance
longevity.aov <- aov(Longevity ~ CompanionNumber, data = dta)
# Summary of the analysis
summary(longevity.aov)

                 Df Sum Sq Mean Sq F value   Pr(>F)    
CompanionNumber   4  11939  2984.8   13.61 3.52e-09 ***
Residuals       120  26314   219.3                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpret the result of one-way ANOVA tests:

As the p-value is less than the significance level \(0.05\), we can conclude that there are significant differences between the groups highlighted with “***” in the model summary.

Tukey multiple pairwise-comparisons

As the ANOVA test is significant, we can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for performing multiple pairwise-comparison between the means of groups.

The function TukeyHD() takes the fitted ANOVA as an argument.

TukeyHSD(longevity.aov)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Longevity ~ CompanionNumber, data = dta)

$CompanionNumber
                      diff       lwr        upr     p adj
Pregnant1-None0       1.24 -10.36047  12.840468 0.9983034
Pregnant8-None0      -0.20 -11.80047  11.400468 0.9999988
Virgin1-None0        -6.80 -18.40047   4.800468 0.4854206
Virgin8-None0       -24.84 -36.44047 -13.239532 0.0000003
Pregnant8-Pregnant1  -1.44 -13.04047  10.160468 0.9969591
Virgin1-Pregnant1    -8.04 -19.64047   3.560468 0.3126549
Virgin8-Pregnant1   -26.08 -37.68047 -14.479532 0.0000001
Virgin1-Pregnant8    -6.60 -18.20047   5.000468 0.5157692
Virgin8-Pregnant8   -24.64 -36.24047 -13.039532 0.0000004
Virgin8-Virgin1     -18.04 -29.64047  -6.439532 0.0003240

Check ANOVA assumptions: test validity?

# 1. Homogeneity of variances
plot(longevity.aov, 1)

# 2. Normality
plot(longevity.aov, 2)

# Extract the residuals
aov_residuals <- residuals(object = longevity.aov )
# Run Shapiro-Wilk test
shapiro.test(x = aov_residuals ) #Shapiro-Wilk test on the ANOVA residuals (W = 0.99, p = 0.4) which finds no indication that normality is violated.


    Shapiro-Wilk normality test

data:  aov_residuals
W = 0.98887, p-value = 0.4088

A Shapiro-Wilk test on the ANOVA residuals (W = 0.99, p = 0.4) which finds no indication that normality is violated.

Bayesian approach

dataList <- list(Ntotal = nrow(dta),
                 y = dta$Longevity,
                 x = as.integer(as.factor(dta$CompanionNumber)),
                 NxLvl = nlevels(as.factor(dta$CompanionNumber)),
                 agammaShRa=unlist(gammaShRaFromModeSD(mode = sd(dta$Longevity)/2,
                                                       sd = 2*sd(dta$Longevity))))

Function gammaShRaFromModeSD() from the book calculates shape and rate parameters of gamma distribution from mode and standard deviation.

#view source code of function gammaShRaFromModeSD()
gammaShRaFromModeSD

function (mode, sd) 
{
    if (mode <= 0) 
        stop("mode must be > 0")
    if (sd <= 0) 
        stop("sd must be > 0")
    rate = (mode + sqrt(mode^2 + 4 * sd^2))/(2 * sd^2)
    shape = 1 + mode * rate
    return(list(shape = shape, rate = rate))
}

(ShRa<-gammaShRaFromModeSD(mode=sd(dta$Longevity)/2,sd=2*sd(dta$Longevity)))

$shape
[1] 1.283196

$rate
[1] 0.03224747

With these parameters standard deviation of slopes is concentrated around a relatively small value 8.7819463, but has fat enough right tail.

xAxis <- seq(from=0.001, to=100, by=1 )
plot(xAxis, dgamma(xAxis, shape = ShRa$shape, rate = ShRa$rate), type='l',
     ylab="Gamma Density",xlab="Sigma_Alpha", main="Prior for Sigma_Alpha: mode=8.78")
abline(v=sd(dta$Longevity)/2)

Prepare the model description.

modelString<-"
data {
    int<lower=1> Ntotal;
    real y[Ntotal];
    int<lower=2> NxLvl;
    int<lower=1, upper=NxLvl> x[Ntotal];
    real<lower=0> agammaShRa[2];
}
transformed data {
    real meanY;
    real sdY;
    meanY = mean(y);
    sdY = sd(y);
}
parameters {
    real a0;
    real<lower=0> aSigma;
    vector[NxLvl] a;
    real<lower=0> ySigma;
}
model {
    a0 ~ normal(meanY, 5*sdY);
    aSigma ~ gamma(agammaShRa[1], agammaShRa[2]);
    a ~ normal(0, aSigma);
    ySigma ~ uniform(sdY/100, sdY*10);
    for ( i in 1:Ntotal ) {
        y[i] ~ normal(a0 + a[x[i]], ySigma);
    }
}
generated quantities {
    // Convert a0,a[] to sum-to-zero b0,b[] :
        real b0;
    vector[NxLvl] b;
    b0 = a0 + mean(a);
    b = a - mean(a);
}"

Run MCMC.

stanDso <- stan_model(model_code = modelString)

If saved DSO is used load it, then run the chains.

# saveRDS(stanDso,file="data/stanDSO.Rds")
stanDso <- readRDS("data/stanDSO.Rds")

fit <- sampling(stanDso, 
                data = dataList, 
                pars = c('b0', 'b', 'aSigma', 'ySigma'),
                iter = 5000, chains = 2, cores = 4)

Analyze the chains using shinystan()

launch_shinystan(fit)

summary(fit)$summary[,c(1,4,6,8,10)]

               mean         2.5%          50%      97.5%      Rhat
b0       57.4393525   54.8635804   57.4343609   60.01657 1.0001866
b[1]      5.7588760    0.5134606    5.7673484   10.97372 0.9999060
b[2]      6.9174882    1.8407142    6.8867440   12.16411 0.9999481
b[3]      5.5581312    0.4278215    5.5290942   10.80727 0.9997391
b[4]     -0.5954606   -5.7719881   -0.5929975    4.69568 1.0000531
b[5]    -17.6390348  -22.7535663  -17.6253508  -12.37578 1.0002923
aSigma   15.4223486    6.3137021   13.3207333   37.62849 1.0021829
ySigma   14.9461756   13.2475061   14.9014775   16.95289 1.0005380
lp__   -409.5860642 -414.8603961 -409.2042069 -406.39237 1.0017972

cbind(GroupMeans=grMeans,
      EstimatedMeans=summary(fit)$summary[2:6,1]+summary(fit)$summary[1,1])

     GroupMeans.CompanionNumber GroupMeans.count GroupMeans.mean
b[1]                      None0               25           63.56
b[2]                  Pregnant1               25           64.80
b[3]                  Pregnant8               25           63.36
b[4]                    Virgin1               25           56.76
b[5]                    Virgin8               25           38.72
     GroupMeans.sd EstimatedMeans
b[1]      16.45215       63.19823
b[2]      15.65248       64.35684
b[3]      14.53983       62.99748
b[4]      14.92838       56.84389
b[5]      12.10207       39.80032

Note differences between estimated group means and also shrinkage.

stan_ac(fit, separate_chains = T)

pairs(fit)

plot(fit)

plot(fit,pars=c("b"))

stan_dens(fit)

From MCMC we see immediately that not all parameters \(\beta_j\) are zeros (omnibus test).
As usual, the following question in ANOVA is: which group means are not equal.
To answer this question we use contrasts.

In FNP approach there is a constraint on how many pairs or subgroups can be checked with contrasts with desired accuracy of inference.

In Bayesian approach testing of contrasts is straight forward and does not have restrictions: each step of chain returns a combination of group means. Thus we only need to compare frequency of these combinations in the chain.

Calculate contrasts for betas.

Extract chains for parameters b.

fit_ext <- rstan::extract(fit)
head(fit_ext$b)

          
iterations     [,1]       [,2]     [,3]       [,4]      [,5]
      [1,] 7.477280  3.4507237 6.191515 -4.0575910 -13.06193
      [2,] 3.379211 12.8289133 5.852268  0.1562871 -22.21668
      [3,] 7.777005  0.5933093 6.875631  0.7496060 -15.99555
      [4,] 6.199593  4.9381578 5.802773  2.4810871 -19.42161
      [5,] 8.107023  4.9390397 6.255589 -1.3545609 -17.94709
      [6,] 5.496331  7.7539725 9.307447 -5.7502948 -16.80746

dim(fit_ext$b)

[1] 5000    5

Contrast between group 1 (‘None0’) and group 5 (‘Virgin8’)

contrast_1_5 <- fit_ext$b[,1] - fit_ext$b[,5]
plot(contrast_1_5)

hist(contrast_1_5)

(hdiContrast_1_5 <- hdi(contrast_1_5))

   lower    upper 
15.21982 31.95605 
attr(,"credMass")
[1] 0.95

(sd.contrast_1_5 <- sd(contrast_1_5))

[1] 4.256989

plot(rank(fit_ext$b[,1]), rank(fit_ext$b[,5]))

Contrast between average of groups 1-3 (None0, Pregnant1, Pregnant8) and average of groups 4-5 (Virgin1, Virgin8)

Comb1<-(fit_ext$b[,1] + fit_ext$b[,2] + fit_ext$b[,3])/3
Comb2<-(fit_ext$b[,4] + fit_ext$b[,5])/2
contrast_123_45 <- Comb1 - Comb2
plot(contrast_123_45)

hist(contrast_123_45)

(hdiContrast_123_45<-hdi(contrast_123_45))

    lower     upper 
 9.713508 20.416612 
attr(,"credMass")
[1] 0.95

(sd.contrast_123_45<-sd(contrast_123_45))

[1] 2.734301

head(cbind(Comb1,Comb2))

        Comb1      Comb2
[1,] 5.706506  -8.559759
[2,] 7.353464 -11.030196
[3,] 5.081982  -7.622973
[4,] 5.646841  -8.470262
[5,] 6.433884  -9.650826
[6,] 7.519250 -11.278875

plot(Comb1[1:100],Comb2[1:100])

plot(rank((fit_ext$b[,1] + fit_ext$b[,2] + fit_ext$b[,3])/3),rank((fit_ext$b[,4] + fit_ext$b[,5])/2))

Two-way ANOVA

Model description

Define the model

modelString<-"
data {
    int<lower=1> Ntotal;
    vector[Ntotal] y;
    int<lower=2> Nx1Lvl;
    int<lower=2> Nx2Lvl;
    int<lower=1, upper=Nx1Lvl> x1[Ntotal];
    int<lower=1, upper=Nx2Lvl> x2[Ntotal];
    real<lower=0> agammaShRa[2];
}
transformed data {
    real meanY;
    real sdY;
    vector[Ntotal] zy;
    meanY = mean(y);
    sdY = sd(y);
    zy = (y - mean(y)) / sdY;  // center & normalize
}
parameters {
    real a0;
    real<lower=0> a1Sigma;
    real<lower=0> a2Sigma;
    real<lower=0> a1a2Sigma;
    vector[Nx1Lvl] a1;
    vector[Nx2Lvl] a2;
    matrix[Nx1Lvl,Nx2Lvl] a1a2;
    real<lower=0> zySigma;
}
model {
    a0 ~ normal(0, 1);
    a1Sigma ~ gamma(agammaShRa[1], agammaShRa[2]);
    a1 ~ normal(0, a1Sigma);
    a2Sigma ~ gamma(agammaShRa[1], agammaShRa[2]);
    a2 ~ normal(0, a2Sigma);
    a1a2Sigma ~ gamma(agammaShRa[1], agammaShRa[2]);
    for (j1 in 1:Nx1Lvl) {
        a1a2[j1,] ~ normal(0, a1a2Sigma);
    }
    zySigma ~ uniform(1.0/10, 10);
    for ( i in 1:Ntotal ) {
        zy[i] ~ normal(a0 + a1[x1[i]] + a2[x2[i]]+ a1a2[x1[i],x2[i]], zySigma);
    }
}
generated quantities {
    // Convert a to sum-to-zero b :
    real b0;
    vector[Nx1Lvl] b1;
    vector[Nx2Lvl] b2;
    matrix[Nx1Lvl,Nx2Lvl] b1b2;
    matrix[Nx1Lvl,Nx2Lvl] m;
    real<lower=0> b1Sigma;
    real<lower=0> b2Sigma;
    real<lower=0> b1b2Sigma;
    real<lower=0> ySigma;
    for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
        m[j1,j2] = a0 + a1[j1] + a2[j2] + a1a2[j1,j2]; // cell means 
    } }
    b0 = mean(m);
    for ( j1 in 1:Nx1Lvl ) { b1[j1] = mean( m[j1,] ) - b0; }
    for ( j2 in 1:Nx2Lvl ) { b2[j2] = mean( m[,j2] ) - b0; }
    for ( j1 in 1:Nx1Lvl ) { for ( j2 in 1:Nx2Lvl ) {
        b1b2[j1,j2] = m[j1,j2] - ( b0 + b1[j1] + b2[j2] );  
    } }
    // transform to original scale:
    b0 = meanY + sdY * b0;
    b1 = sdY * b1;
    b2 = sdY * b2;
    b1b2 = sdY * b1b2;
    b1Sigma = sdY * a1Sigma;
    b2Sigma = sdY * a2Sigma;
    b1b2Sigma = sdY * a1a2Sigma;
    ySigma = sdY * zySigma;
}"

#Create DSO.
stanDsoANOVA2Way <- stan_model(model_code=modelString)

#If saved DSO is used load it, then run the chains.
# saveRDS(stanDsoANOVA2Way,file="data/stanDso2WayANOVA.Rds")
stanDsoANOVA2Way <- readRDS(file="data/stanDso2WayANOVA.Rds")

Data of section 20.1 in (Kruschke 2015)

Read the data.

The data show faculty salaries of a university.

The columns selected as predictors are:

Position (Assistant Professor, Associate Professor, Full Professor, Full Professor with endowment salary and Distinguished Professor) Department (total 60 departments)

# 20_1: Metric Predicted Variable with Two Nominal Predictors

# load data from 'Salary.csv' (see Kruschke)
df <- read.csv("data/Salary.csv")
mean(df$Salary)

[1] 110452

tail(df)

      Org                                  OrgName Cla Pos ClaPos
1075 OADT Business Operations & Decision Technolog  PT FT3 PT.FT3
1076 CSCI                         Computer Science  PT FT3 PT.FT3
1077 PHYS                                  Physics  PT FT3 PT.FT3
1078 GEOL                      Geological Sciences  PR FT1 PR.FT1
1079   BI                                  Biology  PR FT1 PR.FT1
1080 MUST           Music School-String Technology  PR FT1 PR.FT1
     Salary
1075 147400
1076  92500
1077  76748
1078 105786
1079 116592
1080 138022

dim(df)

[1] 1080    6

names(df)

[1] "Org"     "OrgName" "Cla"     "Pos"     "ClaPos"  "Salary"

with(df, table(Pos))

Pos
DST FT1 FT2 FT3 NDW 
 32 360 364 240  84

with(df, table(Org))

Org
ACTG AFRO AMST ANTH APHS  AST BEPP BFIN   BI BLAN CEDP CEUS CHEM CMCL 
  19    7    7   23   18    7    9   21   51    7   25   11   35   17 
CMLT CRIN CSCI EALC ECON ELPS  ENG FINH FINS FOLK FRIT GEOG GEOL GERM 
   8   23   30   14   20   16   39    9   14   11   14    8   19    9 
HIST INFO JOUR KINE LAWS LGED LING MATH MGMT MKTG MUHI MUIN MUST MUTH 
  35   20   14   15   37   11    8   40   12   13    9   38   14    8 
MUVO OADT  OPT PHYS   PL POLS  PSY  REL RPAD SLIS  SLS  SOC SPAN SPEA 
  10   14   11   32   14   22   41   13   10   13    7   20   21   40 
SPHS STAT TELC THTR 
  15    7   12   13

length(table(df$Org))

[1] 60

y <- df$Salary
x1 <- as.factor(df$Pos)
x2 <- as.factor(df$Org)
str(df)

'data.frame':   1080 obs. of  6 variables:
 $ Org    : chr  "PL" "MUTH" "ENG" "CMLT" ...
 $ OrgName: chr  "Philosophy" "Music Theory" "English" "Comparative Literature" ...
 $ Cla    : chr  "PC" "PC" "PC" "PC" ...
 $ Pos    : chr  "FT2" "FT2" "FT2" "FT2" ...
 $ ClaPos : chr  "PC.FT2" "PC.FT2" "PC.FT2" "PC.FT2" ...
 $ Salary : int  72395 61017 82370 68805 63796 219600 98814 107745 114275 173302 ...

dataListSalary <- list(Ntotal=length(y),
                       y=y,
                       x1=as.integer(x1),
                       x2=as.integer(x2),
                       Nx1Lvl=nlevels(x1),
                       Nx2Lvl=nlevels(x2),
                       agammaShRa=unlist(gammaShRaFromModeSD(mode=1/2, sd=2)))

Create names of variables and their interactions

(namesPos<-names(table(df$Pos)))

[1] "DST" "FT1" "FT2" "FT3" "NDW"

(namesOrg<-names(table(df$Org)))

 [1] "ACTG" "AFRO" "AMST" "ANTH" "APHS" "AST"  "BEPP" "BFIN" "BI"  
[10] "BLAN" "CEDP" "CEUS" "CHEM" "CMCL" "CMLT" "CRIN" "CSCI" "EALC"
[19] "ECON" "ELPS" "ENG"  "FINH" "FINS" "FOLK" "FRIT" "GEOG" "GEOL"
[28] "GERM" "HIST" "INFO" "JOUR" "KINE" "LAWS" "LGED" "LING" "MATH"
[37] "MGMT" "MKTG" "MUHI" "MUIN" "MUST" "MUTH" "MUVO" "OADT" "OPT" 
[46] "PHYS" "PL"   "POLS" "PSY"  "REL"  "RPAD" "SLIS" "SLS"  "SOC" 
[55] "SPAN" "SPEA" "SPHS" "STAT" "TELC" "THTR"

Interactions names:

#use outer() to create names for interactions.
as.vector(outer(1:4,1:2,paste,sep="-"))

[1] "1-1" "2-1" "3-1" "4-1" "1-2" "2-2" "3-2" "4-2"

#apply to our case
(namesInter<-as.vector(outer(namesOrg,namesPos,paste,sep="-")))

  [1] "ACTG-DST" "AFRO-DST" "AMST-DST" "ANTH-DST" "APHS-DST"
  [6] "AST-DST"  "BEPP-DST" "BFIN-DST" "BI-DST"   "BLAN-DST"
 [11] "CEDP-DST" "CEUS-DST" "CHEM-DST" "CMCL-DST" "CMLT-DST"
 [16] "CRIN-DST" "CSCI-DST" "EALC-DST" "ECON-DST" "ELPS-DST"
 [21] "ENG-DST"  "FINH-DST" "FINS-DST" "FOLK-DST" "FRIT-DST"
 [26] "GEOG-DST" "GEOL-DST" "GERM-DST" "HIST-DST" "INFO-DST"
 [31] "JOUR-DST" "KINE-DST" "LAWS-DST" "LGED-DST" "LING-DST"
 [36] "MATH-DST" "MGMT-DST" "MKTG-DST" "MUHI-DST" "MUIN-DST"
 [41] "MUST-DST" "MUTH-DST" "MUVO-DST" "OADT-DST" "OPT-DST" 
 [46] "PHYS-DST" "PL-DST"   "POLS-DST" "PSY-DST"  "REL-DST" 
 [51] "RPAD-DST" "SLIS-DST" "SLS-DST"  "SOC-DST"  "SPAN-DST"
 [56] "SPEA-DST" "SPHS-DST" "STAT-DST" "TELC-DST" "THTR-DST"
 [61] "ACTG-FT1" "AFRO-FT1" "AMST-FT1" "ANTH-FT1" "APHS-FT1"
 [66] "AST-FT1"  "BEPP-FT1" "BFIN-FT1" "BI-FT1"   "BLAN-FT1"
 [71] "CEDP-FT1" "CEUS-FT1" "CHEM-FT1" "CMCL-FT1" "CMLT-FT1"
 [76] "CRIN-FT1" "CSCI-FT1" "EALC-FT1" "ECON-FT1" "ELPS-FT1"
 [81] "ENG-FT1"  "FINH-FT1" "FINS-FT1" "FOLK-FT1" "FRIT-FT1"
 [86] "GEOG-FT1" "GEOL-FT1" "GERM-FT1" "HIST-FT1" "INFO-FT1"
 [91] "JOUR-FT1" "KINE-FT1" "LAWS-FT1" "LGED-FT1" "LING-FT1"
 [96] "MATH-FT1" "MGMT-FT1" "MKTG-FT1" "MUHI-FT1" "MUIN-FT1"
[101] "MUST-FT1" "MUTH-FT1" "MUVO-FT1" "OADT-FT1" "OPT-FT1" 
[106] "PHYS-FT1" "PL-FT1"   "POLS-FT1" "PSY-FT1"  "REL-FT1" 
[111] "RPAD-FT1" "SLIS-FT1" "SLS-FT1"  "SOC-FT1"  "SPAN-FT1"
[116] "SPEA-FT1" "SPHS-FT1" "STAT-FT1" "TELC-FT1" "THTR-FT1"
[121] "ACTG-FT2" "AFRO-FT2" "AMST-FT2" "ANTH-FT2" "APHS-FT2"
[126] "AST-FT2"  "BEPP-FT2" "BFIN-FT2" "BI-FT2"   "BLAN-FT2"
[131] "CEDP-FT2" "CEUS-FT2" "CHEM-FT2" "CMCL-FT2" "CMLT-FT2"
[136] "CRIN-FT2" "CSCI-FT2" "EALC-FT2" "ECON-FT2" "ELPS-FT2"
[141] "ENG-FT2"  "FINH-FT2" "FINS-FT2" "FOLK-FT2" "FRIT-FT2"
[146] "GEOG-FT2" "GEOL-FT2" "GERM-FT2" "HIST-FT2" "INFO-FT2"
[151] "JOUR-FT2" "KINE-FT2" "LAWS-FT2" "LGED-FT2" "LING-FT2"
[156] "MATH-FT2" "MGMT-FT2" "MKTG-FT2" "MUHI-FT2" "MUIN-FT2"
[161] "MUST-FT2" "MUTH-FT2" "MUVO-FT2" "OADT-FT2" "OPT-FT2" 
[166] "PHYS-FT2" "PL-FT2"   "POLS-FT2" "PSY-FT2"  "REL-FT2" 
[171] "RPAD-FT2" "SLIS-FT2" "SLS-FT2"  "SOC-FT2"  "SPAN-FT2"
[176] "SPEA-FT2" "SPHS-FT2" "STAT-FT2" "TELC-FT2" "THTR-FT2"
[181] "ACTG-FT3" "AFRO-FT3" "AMST-FT3" "ANTH-FT3" "APHS-FT3"
[186] "AST-FT3"  "BEPP-FT3" "BFIN-FT3" "BI-FT3"   "BLAN-FT3"
[191] "CEDP-FT3" "CEUS-FT3" "CHEM-FT3" "CMCL-FT3" "CMLT-FT3"
[196] "CRIN-FT3" "CSCI-FT3" "EALC-FT3" "ECON-FT3" "ELPS-FT3"
[201] "ENG-FT3"  "FINH-FT3" "FINS-FT3" "FOLK-FT3" "FRIT-FT3"
[206] "GEOG-FT3" "GEOL-FT3" "GERM-FT3" "HIST-FT3" "INFO-FT3"
[211] "JOUR-FT3" "KINE-FT3" "LAWS-FT3" "LGED-FT3" "LING-FT3"
[216] "MATH-FT3" "MGMT-FT3" "MKTG-FT3" "MUHI-FT3" "MUIN-FT3"
[221] "MUST-FT3" "MUTH-FT3" "MUVO-FT3" "OADT-FT3" "OPT-FT3" 
[226] "PHYS-FT3" "PL-FT3"   "POLS-FT3" "PSY-FT3"  "REL-FT3" 
[231] "RPAD-FT3" "SLIS-FT3" "SLS-FT3"  "SOC-FT3"  "SPAN-FT3"
[236] "SPEA-FT3" "SPHS-FT3" "STAT-FT3" "TELC-FT3" "THTR-FT3"
[241] "ACTG-NDW" "AFRO-NDW" "AMST-NDW" "ANTH-NDW" "APHS-NDW"
[246] "AST-NDW"  "BEPP-NDW" "BFIN-NDW" "BI-NDW"   "BLAN-NDW"
[251] "CEDP-NDW" "CEUS-NDW" "CHEM-NDW" "CMCL-NDW" "CMLT-NDW"
[256] "CRIN-NDW" "CSCI-NDW" "EALC-NDW" "ECON-NDW" "ELPS-NDW"
[261] "ENG-NDW"  "FINH-NDW" "FINS-NDW" "FOLK-NDW" "FRIT-NDW"
[266] "GEOG-NDW" "GEOL-NDW" "GERM-NDW" "HIST-NDW" "INFO-NDW"
[271] "JOUR-NDW" "KINE-NDW" "LAWS-NDW" "LGED-NDW" "LING-NDW"
[276] "MATH-NDW" "MGMT-NDW" "MKTG-NDW" "MUHI-NDW" "MUIN-NDW"
[281] "MUST-NDW" "MUTH-NDW" "MUVO-NDW" "OADT-NDW" "OPT-NDW" 
[286] "PHYS-NDW" "PL-NDW"   "POLS-NDW" "PSY-NDW"  "REL-NDW" 
[291] "RPAD-NDW" "SLIS-NDW" "SLS-NDW"  "SOC-NDW"  "SPAN-NDW"
[296] "SPEA-NDW" "SPHS-NDW" "STAT-NDW" "TELC-NDW" "THTR-NDW"

#all names:
varNames<-c("Intercept", namesPos, namesOrg, namesInter, rep("Var",5)) #why need to rep Var 5 times

MCMC

# fit model
fit <- sampling(stanDsoANOVA2Way, 
                data=dataListSalary, 
                pars=c('b0',
                       'b1', 
                       'b2', 
                       'b1b2',
                       'b1Sigma', 
                       'b2Sigma',
                       'b1b2Sigma',
                       'ySigma'),
                iter=5000, chains = 2, cores = 4)

# save(fit, file = "data/fitinstan1.Rdata")
load("data/fitinstan1.Rdata")

launch_shinystan(fit)

Create results including mean value, 2.5%, 50% and 97.5% quantiles. Add variable names as row names.

SalaryResults <- summary(fit)$summary[,c(1,4,6,8)]
varNames[nrow(SalaryResults)-(4:0)] <- rownames(SalaryResults)[nrow(SalaryResults)-(4:0)]
rownames(SalaryResults) <- varNames
head(SalaryResults)

                mean       2.5%        50%       97.5%
Intercept 127090.073 124721.828 127081.992 129373.0577
DST        55548.126  48387.941  55571.959  62569.1969
FT1        -3101.583  -5949.631  -3126.253   -148.3999
FT2       -33047.754 -35838.531 -33046.367 -30203.8486
FT3       -46367.618 -49353.221 -46385.131 -43252.7522
NDW        26968.829  22202.907  26981.535  31482.3894

Make plots of mean values and HDIs.

plot(fit,pars=c("b1"))

plot(fit,pars=c('b2'))

plot(fit,pars=c("b1b2"))

Plots show that not all coefficients of the model equal zero. This answers the question of utility test.

Working with chains and contrasts

Extract chains for the position variables.

fit_ext <- rstan::extract(fit)
names(fit_ext)

[1] "b0"        "b1"        "b2"        "b1b2"      "b1Sigma"  
[6] "b2Sigma"   "b1b2Sigma" "ySigma"    "lp__"

fit_ext.b0<-fit_ext$b0
fit_ext.b1<-fit_ext$b1
colnames(fit_ext.b1) <- namesPos
head(fit_ext.b1)

          
iterations      DST       FT1       FT2       FT3      NDW
      [1,] 53643.51 -2464.063 -32308.67 -48308.55 29437.76
      [2,] 58738.28 -1706.306 -33827.16 -46158.53 22953.73
      [3,] 55505.48 -3764.629 -31670.52 -44799.95 24729.62
      [4,] 53497.58 -4276.004 -31554.91 -48626.05 30959.38
      [5,] 57104.01 -2315.971 -34761.95 -48452.81 28426.72
      [6,] 57498.84 -2508.874 -33852.21 -47274.01 26136.25

Extract chains for the department variables.

fit_ext.b2 <- fit_ext$b2
colnames(fit_ext.b2) <- namesOrg
head(fit_ext.b2)

          
iterations     ACTG       AFRO        AMST       ANTH        APHS
      [1,] 84518.72 -10683.803 -31817.1077  -4720.261   3396.7104
      [2,] 88156.40 -12387.445    836.3933 -20495.377 -11482.3619
      [3,] 81611.69 -13618.307 -28257.8104 -18097.996  11482.2086
      [4,] 79194.53 -21153.965 -15077.6255 -21260.863   -125.4459
      [5,] 85622.17 -25296.108 -18530.4846 -20764.400    564.3309
      [6,] 82473.44  -3693.756 -14717.9823 -22886.365   6663.3884
          
iterations        AST     BEPP      BFIN        BI     BLAN      CEDP
      [1,]  -272.2711 56966.20 113408.94 -6397.807 24797.14 -23240.80
      [2,]  8032.0282 50815.51 108789.31 -2003.753 23891.55 -22317.59
      [3,] -4209.3496 54267.00 105146.37 -9549.098 14526.17 -11514.01
      [4,]  8876.8401 30764.92  99360.17  2224.069 15125.27 -20714.19
      [5,]  8044.0916 46878.48 118933.83 -2619.729 10774.34 -13026.26
      [6,]  7926.3391 46454.59 109490.48 -2284.006 24635.29 -21256.39
          
iterations      CEUS     CHEM       CMCL      CMLT      CRIN
      [1,] -21339.84 21278.46 -16396.336 -13661.83 -28726.73
      [2,] -18465.13 14471.89   2961.902 -37086.56 -29609.02
      [3,] -19574.64 23867.96 -14992.093 -22798.07 -20194.33
      [4,] -25334.82 21066.61 -12840.202 -21092.47 -17214.27
      [5,] -20353.06 20205.76 -18418.019 -29499.71 -23319.70
      [6,] -19667.90 14114.09  -6386.052 -16853.90 -14670.81
          
iterations      CSCI       EALC     ECON        ELPS       ENG
      [1,] 19070.861 -22627.064 53690.70  -2256.9313 -23618.76
      [2,]  6341.859 -16326.435 63001.24  -6692.3318 -24588.41
      [3,]  5777.751 -24292.943 60270.28 -11446.9859 -22431.50
      [4,] 19402.512 -25074.484 56716.46   -865.0668 -21256.75
      [5,] 13926.101 -11248.245 45142.49  -5962.1880 -19477.31
      [6,] 14513.160  -3620.939 57299.77  -7540.6420 -12163.67
          
iterations      FINH      FINS      FOLK       FRIT       GEOG
      [1,] -14346.30 -18209.02 -29875.32 -21655.118  -5639.289
      [2,] -15715.47 -20946.13 -23147.34 -23395.993 -16202.811
      [3,] -26414.60 -22185.23 -17332.30 -14807.023 -18898.645
      [4,] -13383.51 -21330.12 -15269.66 -16003.793  -9272.097
      [5,] -23487.76 -14402.51 -20000.94 -15901.330  -5976.184
      [6,] -16291.29 -24708.77 -21488.15  -2759.774 -15596.673
          
iterations       GEOL      GERM       HIST      INFO       JOUR
      [1,] -22550.037 -23929.79 -11922.908 14948.452 -24063.132
      [2,]  -7012.096 -51561.03  -9361.597  6416.838 -26205.125
      [3,] -10887.379 -32162.35  -9143.212  4342.127  -9072.310
      [4,] -16233.336 -23952.16  -9301.820 15654.625 -23874.915
      [5,] -21942.573 -35140.01 -15107.426 15316.580 -15393.027
      [6,]  -4562.365 -30880.93 -16131.613  7988.985  -9296.838
          
iterations       KINE     LAWS      LGED       LING        MATH
      [1,]  -9193.225 38785.02 -17528.29 -17190.155 -8225.30392
      [2,]  -6282.942 47666.25 -17899.19 -13565.903 -7706.18824
      [3,]  -9910.339 45848.64 -27964.54  -6617.547 -1768.38643
      [4,] -11922.797 33108.61 -18500.86   2572.191 -3658.10736
      [5,]  -2202.068 39704.44 -14898.99 -15872.538 -8693.52608
      [6,]  -5544.008 38161.16 -33744.68  -9544.743   -85.51853
          
iterations     MGMT     MKTG      MUHI       MUIN       MUST
      [1,] 62113.34 64268.19 -32968.88 -20481.139  -844.2042
      [2,] 62136.44 67642.69 -27573.69 -18587.600 -6443.1771
      [3,] 74127.78 71343.24 -34818.65  -5149.272  -998.3260
      [4,] 59756.08 62986.43 -24386.81 -27655.967  7488.0542
      [5,] 63347.38 58431.11 -27734.24 -23368.999  7810.1252
      [6,] 47017.26 69201.49 -35955.53 -19428.976 -6371.6317
          
iterations      MUTH      MUVO     OADT       OPT       PHYS
      [1,] -27687.26 -29106.00 53817.90 -13708.67   938.1202
      [2,] -37874.35 -31662.78 57211.89 -17478.04 -5738.0122
      [3,] -28434.64 -33955.67 60768.24  -4218.98  3772.0028
      [4,] -41578.53 -26331.99 52976.79 -21015.56  6144.4542
      [5,] -21352.16 -24057.30 55091.18 -14208.71 -3552.6289
      [6,] -35308.73 -39279.62 54524.81 -13330.94 -2781.7758
          
iterations         PL      POLS       PSY        REL       RPAD
      [1,] -11753.736  3513.843  7017.447  -6453.305  -3187.879
      [2,]  -8796.469 15546.737  8450.865 -14424.987 -12159.452
      [3,]  -9792.748  5648.549  9457.655 -12918.854  -4778.571
      [4,]   3873.790  7563.929  3148.918 -14619.568  -6131.743
      [5,] -19536.730 10201.619 10906.627  -1860.096  -2935.141
      [6,]  -2805.521  4428.878  9198.403 -12073.818  -1760.722
          
iterations       SLIS        SLS        SOC      SPAN     SPEA
      [1,] 12208.8805 -17676.380 -14610.381 -26932.70 12609.29
      [2,]  3209.0633   -152.842 -10660.845 -11541.33 21244.00
      [3,]  6337.0940 -22995.180  -3894.261 -28734.78 13156.72
      [4,]  6231.9672 -17295.152  -5021.769 -16128.08 15097.28
      [5,]   849.9127 -21943.215  -5528.208 -10620.69 12476.60
      [6,]  8901.4092 -26578.505  -5382.470 -21548.77 16643.46
          
iterations      SPHS        STAT       TELC      THTR
      [1,] 12925.959 10388.95421  -2909.144 -32256.04
      [2,]  8928.522  6829.27323 -17467.685 -11563.16
      [3,]  2515.913   963.79821  -8621.868 -27778.40
      [4,]  2017.494  -229.95943  -7111.211 -19132.31
      [5,]  2956.340    79.86284 -12122.435 -20908.74
      [6,]  8378.987 -8767.91555 -27142.360 -33120.35

Extract chains for interaction variables.

fit_ext.b1.b2<-fit_ext$b1b2
dim(fit_ext.b1.b2)

[1] 5000    5   60

Interaction chains make a cube.

dimnames(fit_ext.b1.b2)[[2]]<-namesPos
dimnames(fit_ext.b1.b2)[[3]]<-namesOrg
dimnames(fit_ext.b1.b2)

$iterations
NULL

[[2]]
[1] "DST" "FT1" "FT2" "FT3" "NDW"

[[3]]
 [1] "ACTG" "AFRO" "AMST" "ANTH" "APHS" "AST"  "BEPP" "BFIN" "BI"  
[10] "BLAN" "CEDP" "CEUS" "CHEM" "CMCL" "CMLT" "CRIN" "CSCI" "EALC"
[19] "ECON" "ELPS" "ENG"  "FINH" "FINS" "FOLK" "FRIT" "GEOG" "GEOL"
[28] "GERM" "HIST" "INFO" "JOUR" "KINE" "LAWS" "LGED" "LING" "MATH"
[37] "MGMT" "MKTG" "MUHI" "MUIN" "MUST" "MUTH" "MUVO" "OADT" "OPT" 
[46] "PHYS" "PL"   "POLS" "PSY"  "REL"  "RPAD" "SLIS" "SLS"  "SOC" 
[55] "SPAN" "SPEA" "SPHS" "STAT" "TELC" "THTR"

fit_ext.b1.b2[1,,]

     
            ACTG      AFRO      AMST      ANTH        APHS        AST
  DST   3539.479  6393.899  1304.058 15763.611 -10799.0416   7927.158
  FT1 -16326.820 -7421.503 -6828.542 -9787.681  11662.4623  -2611.558
  FT2   9626.651 -4519.507  3486.851 -9088.313  -2036.0353   2425.408
  FT3   8239.038 -2385.126 10078.649  3483.160   -975.3569 -13603.240
  NDW  -5078.347  7932.237 -8041.017  -370.778   2147.9715   5862.232
     
           BEPP        BFIN        BI       BLAN       CEDP      CEUS
  DST  8728.384  -2330.6655 -2834.672 -5466.1952 -15580.544 -2080.820
  FT1 -3538.370    976.2805  4358.263 -2541.3345  -1900.758  1838.315
  FT2  4630.186  12267.9786 -5681.101  -673.3443   4958.960  4207.809
  FT3 -2312.500   1773.3809 12940.011  1518.5936   9637.629  5874.503
  NDW -7507.699 -12686.9744 -8782.501  7162.2803   2884.713 -9839.807
     
            CHEM      CMCL         CMLT      CRIN       CSCI
  DST   3002.086 -9440.742    -86.93006 -9140.478  3129.8086
  FT1  11524.040 14411.093  -2557.61581 -3161.343 -2309.5690
  FT2  -5293.531 -2668.087 -16853.21654  6077.927 -7840.4627
  FT3 -22330.841 -7414.901   7430.17731 10001.465   422.3283
  NDW  13098.245  5112.637  12067.58510 -3777.572  6597.8948
     
           EALC        ECON       ELPS         ENG       FINH
  DST -4167.890   -258.8566  -721.5942   -692.6053 -4837.1791
  FT1  7427.198   5254.3375  3062.8163   2486.2253 -9159.9057
  FT2  6582.357  12117.9070 -4679.8036   5590.4990 17244.9910
  FT3 -3181.338 -22080.7613 -2932.5083   4007.1760 -2496.7910
  NDW -6660.328   4967.3734  5271.0899 -11391.2951  -751.1152
     
            FINS      FOLK       FRIT       GEOG        GEOL
  DST   325.8752 -3893.956  -6294.419  6675.0396   5774.6693
  FT1   535.5009 -7072.318  17389.175  7479.6932  12352.3715
  FT2  -328.2188  4064.460 -15657.924 -6928.5509   2648.6372
  FT3 -7741.0741  9117.089   6343.980  -701.5859    466.6109
  NDW  7207.9169 -2215.275  -1780.812 -6524.5961 -21242.2889
     
           GERM      HIST        INFO      JOUR       KINE
  DST  4706.445 -8527.565   3237.7335 -1757.081 -10582.306
  FT1 -2672.307 -2134.152   4940.7638 -3147.530  14339.188
  FT2 -5330.064 -3001.392 -12774.4001  4272.784  -3333.583
  FT3  9323.907 -5018.605   5237.0730  9447.929   7535.690
  NDW -6027.980 18681.715   -641.1703 -8816.101  -7958.989
     
             LAWS        LGED       LING       MATH      MGMT
  DST -10015.5598  4591.43556  -7204.124 -11581.540 -3716.644
  FT1   3412.6992  -587.59280   5405.700  -9752.037 -1804.589
  FT2   6352.9234  -399.63708 -14028.469   4526.424  7716.997
  FT3   -899.5615   -10.67023  -4023.939  12426.506 -5186.786
  NDW   1149.4988 -3593.53544  19850.832   4380.648  2991.022
     
             MKTG      MUHI       MUIN        MUST      MUTH
  DST  12502.3899 -7724.675   3129.524   7886.1204 -9039.427
  FT1 -16194.6842  3408.667  -4813.088   5061.4779 12022.885
  FT2  -2920.6533 -3107.146   3336.682  -2460.8449 -1193.335
  FT3    768.3334  6316.702   9569.930 -10862.8928 -6529.798
  NDW   5844.6142  1106.452 -11223.048    376.1394  4739.675
     
           MUVO        OADT        OPT       PHYS        PL      POLS
  DST -9946.788   -888.6629   1872.653  -3744.574  3435.746 -6711.089
  FT1  8902.435 -10644.0479 -12866.850   5992.883  1352.037  1818.654
  FT2 -7975.997  17648.2569   5249.675  14873.035 -9887.859 -4523.008
  FT3 12665.367   3657.3418  -3987.251 -13554.341 12865.421 -2919.634
  NDW -3645.017  -9772.8880   9731.773  -3567.002 -7765.346 12335.077
     
              PSY        REL       RPAD       SLIS       SLS
  DST  11700.3240 -6837.7256   9313.103  15054.896  8808.297
  FT1 -15963.5566  -688.5178 -10034.632   9249.393 -3786.320
  FT2 -12322.9888 -4672.9437   7567.516 -11362.755 -6045.782
  FT3    525.9758 -7990.0302 -13929.224  -9658.127  1906.952
  NDW  16060.2456 20189.2173   7083.237  -3283.406  -883.147
     
            SOC          SPAN       SPEA       SPHS       STAT
  DST -6806.699   1782.323470  12803.000  13314.603  2412.6612
  FT1 -5621.186   1570.915735   4660.468  -6725.916  1597.2451
  FT2  2436.551   9372.847478   2481.490  -6071.677   755.1363
  FT3  1224.220      4.001735   1103.587 -10906.537 -4197.6978
  NDW  8767.113 -12730.088418 -21048.545  10389.527  -567.3448
     
           TELC       THTR
  DST -5399.985   9995.711
  FT1 -7969.306   6130.450
  FT2  3291.536   7848.155
  FT3 -2627.908   4546.298
  NDW 12705.663 -28520.613

Rows correspond to iterations of MCMC.
For each iteration there is interactions table between schools and positions.

Salary of a school across positions is base level beta, plus school beta: \(\beta_0+\beta_{school}\).
Salary of a position across schools is base level beta, plus position beta: \(\beta_0+\beta_{position}\). Salary of a certain position of a certain school is predicted as base level beta, plus school beta, plus position beta, plus the interaction beta: \(\beta_0+\beta_{school}+\beta_{position}+\beta_{school \ \times \ position}\).

To answer each of the following 4 questions do the following:

Create contrast for comparison
Make histogram of the contrast
Calculate mean of the contrast
Calculate 95%-HDI for the contrast. To find HDI use hdi() from library HDInterval

Contrast for comparison of departments

Use contrasts to compare salaries at Business and Finance (“BFIN”) with Physics (“PHYS”) and with Chemistry (“CHEM”) departments.

To do that select columns of MCMC for departments to “BFIN” and “PHYS” and “BFIN” and “CHEM”, take their differences and look at the posterior distribution of the differences

contrast.BFIN.PHYS <- fit_ext.b2[,"BFIN"]-fit_ext.b2[,"PHYS"]
hist(contrast.BFIN.PHYS)

mean(contrast.BFIN.PHYS)

[1] 111653.5

hdi(contrast.BFIN.PHYS)

    lower     upper 
 99528.41 124555.65 
attr(,"credMass")
[1] 0.95

What do we conclude about the differences of salaries?

Plot histograms of salaries for both departments. Salaries BFINComp of Business and Finance and PHYSComp of Physics are calculate as described above.

BFINComp <- fit_ext.b2[,"BFIN"]
PHYSComp <- fit_ext.b2[,"PHYS"]

distrBFIN <- density(BFINComp)
distrPHYS <- density(PHYSComp)

plot(distrPHYS,xlim=c(-20000,200000), lwd=2, main="Salaries Distributions", col="blue")
lines(distrBFIN$x, distrPHYS$y, lwd=2, col="orange")
legend("top", legend = c("PHYS","BFIN"), col = c("blue","orange"), lty=1, lwd=2)

Do the same comparison for finance professors and chemistry professors.

contrast.BFIN.CHEM<-fit_ext.b2[,"BFIN"]-fit_ext.b2[,"CHEM"]
hist(contrast.BFIN.CHEM)

mean(contrast.BFIN.CHEM)

[1] 89933.55

hdi(contrast.BFIN.CHEM)

    lower     upper 
 78655.11 101160.71 
attr(,"credMass")
[1] 0.95

BFINComp <- fit_ext.b2[,"BFIN"]
CHEMComp <- fit_ext.b2[,"CHEM"]

distrBFIN <- density(BFINComp)
distrPHYS <- density(PHYSComp)

plot(distrPHYS,xlim=c(-20000,200000), lwd=2, main="Salaries Distributions", col="blue")
lines(distrBFIN$x, distrPHYS$y, lwd=2, col="orange")
legend("top", legend = c("PHYS","BFIN"), col = c("blue","orange"), lty=1, lwd=2)

Contrast for comparison of positions

Use contrasts to compare salaries of Endowment Full Professor (“NDW”) and Distinguished Full Professor (“DST”). Compare salaries of Full Professor (“FT1”) and Endowment Full Professor

contrast.NDW.DST <- fit_ext.b1[,"NDW"] - fit_ext.b1[,"DST"]
hist(contrast.NDW.DST)

mean(contrast.NDW.DST)

[1] -28579.3

hdi(contrast.NDW.DST)

    lower     upper 
-38342.94 -17938.62 
attr(,"credMass")
[1] 0.95

DSTComp <- fit_ext.b1[,"NDW"]
NDWComp <- fit_ext.b1[,"DST"]

distrDST<-density(DSTComp)
distrNDW<-density(NDWComp)
plot(distrDST,xlim=c(14000,80000),lwd=2,main="Salaries Distributions",col="blue")
lines(distrNDW$x,distrPHYS$y,lwd=2,col="orange")
legend("top",legend=c("DST","NDW"),col=c("blue","orange"),lty=1,lwd=2)

Salary of a Distinguished Full Professor is significantly higher than salary of Endowment Full Professor.

Analyze difference between salaries of Full Professor (“FT1”) and Endowment Full Professor

contrast.NDW.FT1<-fit_ext.b1[,"NDW"]-fit_ext.b1[,"FT1"]
hist(contrast.NDW.FT1)

mean(contrast.NDW.FT1)

[1] 30070.41

hdi(contrast.NDW.FT1)

   lower    upper 
23860.41 35549.30 
attr(,"credMass")
[1] 0.95

Contrast for comparison of spreads

Use contrasts to compare salaries spreads between Full Professor and Assistant Professor at Physics Department and at Chemistry Department.

spreadPhys<-fit_ext.b1[,"FT1"]+fit_ext.b1.b2[,"FT1","PHYS"]-
  (fit_ext.b1[,"FT3"]+fit_ext.b1.b2[,"FT3","PHYS"])
spreadChem<-fit_ext.b1[,"FT1"]+fit_ext.b1.b2[,"FT1","CHEM"]-
  (fit_ext.b1[,"FT3"]+fit_ext.b1.b2[,"FT3","CHEM"])
spread<-spreadPhys-spreadChem
hist(spread)

mean(spread)

[1] -15530.47

hdi(spread)

     lower      upper 
-35565.326   3157.709 
attr(,"credMass")
[1] 0.95

Find the highest HDI level for which the spread of differences between “FT1” and “FT3” is significant.

hdi(spread,.88)

      lower       upper 
-30176.5124   -661.9843 
attr(,"credMass")
[1] 0.88

tryLevels<-seq(from=.8,to=.95,by=.01)
hdis<-sapply(tryLevels,function(z) hdi(spread,z))
maxLevel<-tail(tryLevels[hdis[2,]<0],1)
hdi(spread,maxLevel)

      lower       upper 
-30176.5124   -661.9843 
attr(,"credMass")
[1] 0.88

Understanding the effect of scaling and transformations on interactions

Nonlinear transformations may affect interactions very significantly.

Illustrate it on a simple simulated example.

mean00<-1
mean10<-3
mean01<-4
mean11<-6
y00<-rnorm(5,mean00,.1)
y10<-rnorm(5,mean10,.1)
y01<-rnorm(5,mean01,.1)
y11<-rnorm(5,mean11,.1)

Plot the effects. If the lines are parallel the effects are additive.

plot(c(0,1),c(mean(y00),mean(y10)),type="b",ylim=c(1,8),col="darkgreen",lwd=3,ylab="Response",xlab="Predictor 1")
lines(c(0,1),c(mean(y01),mean(y11)),type="b",col="lightblue",lwd=3)
legend("topleft",legend=c("Predictor2 at 0","Predictor2 at 1"),lty=1,lwd=3,col=c("darkgreen","lightblue"))

Taking exponent of the same data introduces significant interaction.

plot(c(0,1),c(mean(exp(y00)),mean(exp(y10))),type="b",ylim=c(1,400),col="darkgreen",lwd=3,ylab="Response",xlab="Predictor 1")
lines(c(0,1),c(mean(exp(y01)),mean(exp(y11))),type="b",col="lightblue",lwd=3)
legend("topleft",legend=c("Predictor2 at 0","Predictor2 at 1"),lty=1,lwd=3,col=c("darkgreen","lightblue"))

ANCOVA

Traditional analysis

We cannot use aov in ANCOVA because it uses Type I SS, instead we should use Type II SS to get the correct results.

longevity.aov2 <- aov(Longevity ~ CompanionNumber + Thorax, data = dta)
# car Anova command on our longevity.aov2 object,
#summary(longevity.aov2) #produces incorrect results
Anova(longevity.aov2, type="II")

Anova Table (Type II tests)

Response: Longevity
                 Sum Sq  Df F value    Pr(>F)    
CompanionNumber  9611.5   4  21.753 1.719e-13 ***
Thorax          13168.9   1 119.219 < 2.2e-16 ***
Residuals       13144.7 119                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The covariate, Thorax, was significantly related to the fly’s longevity, F(1,119)=119.22, p<.001. There was also a significant effect of the Companion Number on the Longevity after controlling for the effect of the Thorax, F(4,119)=21.753, p<.001.

Bayesian approach

Data show significant variance within groups.

An additional metric predictor that can explain part of that variance.

In this example the subjects of the experiment were in different physical conditions which contributed to large variation of life expectancy.

A metric predictor that is available is the size of the species.

Create data with both predictors.

dataList2<-list(Ntotal=nrow(dta),
                y=dta$Longevity,
                xMet=dta$Thorax,
                xNom=as.integer(as.factor(dta$CompanionNumber)),
                NxLvl=nlevels(as.factor(dta$CompanionNumber)),
                agammaShRa=unlist(gammaShRaFromModeSD(mode=sd(dta$Longevity)/2, 
                                                      sd=2*sd(dta$Longevity))))

modelString<-"
data {
    int<lower=1> Ntotal;
    real y[Ntotal];
    int<lower=2> NxLvl;
    int<lower=1, upper=NxLvl> xNom[Ntotal];
    real xMet[Ntotal];
    real<lower=0> agammaShRa[2];
}
transformed data {
    real meanY;
    real sdY;
    real xMetMean;
    real xMetSD;
    meanY = mean(y);
    sdY = sd(y);
    xMetMean = mean(xMet);
    xMetSD = sd(xMet);
}
parameters {
    real a0;
    real<lower=0> aSigma;
    vector[NxLvl] a;
    real aMet;
    real<lower=0> ySigma;
}
model {
    a0 ~ normal(meanY, 5*sdY);
    aSigma ~ gamma(agammaShRa[1], agammaShRa[2]);
    a ~ normal(0, aSigma);
    aMet ~ normal(0, 2*sdY/xMetSD);
    ySigma ~ uniform(sdY/100, sdY*10);
    for ( i in 1:Ntotal ) {
        y[i] ~ normal(a0 + a[xNom[i]] + aMet*(xMet[i] - xMetMean), ySigma);
    }
}
generated quantities {
    // Convert a0,a[] to sum-to-zero b0,b[] :
        real b0;
    vector[NxLvl] b;
    b0 = a0 + mean(a) - aMet * xMetMean;
    b = a - mean(a);
}
"

model2 <- stan_model(model_code=modelString)

If saved DSO is used load it, then run the chains.

# saveRDS(model2, file="data/stanDSO2.Rds")
model2 <- readRDS("data/stanDSO2.Rds")

Run MCMC.

fit2 <- sampling(model2, 
                data=dataList2, 
                pars=c('b0', 'b', 'aMet', 'aSigma', 'ySigma'),
                iter=5000, chains = 2, cores = 4)

launch_shinystan(fit2)

Calculate same contrasts for betas:

Contrasts between group 1 (‘None0’) and 5 (‘Virgin8’).

fit_ext2 <- rstan::extract(fit2)
head(fit_ext2$b)

          
iterations     [,1]     [,2]     [,3]       [,4]      [,5]
      [1,] 4.466494 6.207247 3.940179 -0.8029348 -13.81098
      [2,] 7.582592 4.825399 6.512522 -5.6358837 -13.28463
      [3,] 6.948511 7.865498 6.695544 -6.7136045 -14.79595
      [4,] 4.922978 7.802200 6.146581 -5.6397349 -13.23202
      [5,] 7.228799 5.271264 6.506355 -5.4617319 -13.54469
      [6,] 3.844551 8.005606 6.024163 -5.0992662 -12.77505

contrast2_1_5 <- fit_ext2$b[,1] - fit_ext2$b[,5]
plot(contrast2_1_5)

hist(contrast2_1_5)

(hdiContrast2_1_5<-hdi(contrast2_1_5))

   lower    upper 
13.40625 25.09579 
attr(,"credMass")
[1] 0.95

(sd.contrast2_1_5<-sd(contrast2_1_5))

[1] 2.995819

plot(rank(fit_ext2$b[,1]),rank(fit_ext2$b[,5]))

Contrast between average of groups 1-3 and average of groups 4-5

Comb1<-(fit_ext2$b[,1] + fit_ext2$b[,2] + fit_ext2$b[,3])/3
Comb2<-(fit_ext2$b[,4] + fit_ext2$b[,5])/2
contrast2_123_45 <-Comb1-Comb2
plot(contrast2_123_45)

hist(contrast2_123_45)

(hdiContrast2_123_45<-hdi(contrast2_123_45))

   lower    upper 
11.28173 18.74789 
attr(,"credMass")
[1] 0.95

(sd.contrast2_123_45<-sd(contrast2_123_45))

[1] 1.910106

plot(rank(Comb1),rank(Comb2))

Compare the models.

Using standard deviation:

rbind(WithoutMetricPred=c(sd.contrast_1_5,sd.contrast_123_45),WithMetricPred=c(sd.contrast2_1_5,sd.contrast2_123_45))

                      [,1]     [,2]
WithoutMetricPred 4.256989 2.734301
WithMetricPred    2.995819 1.910106

Using HDI:

rbind(WithoutMetricPred=c(hdiContrast_1_5,hdiContrast_123_45),WithMetricPred=c(hdiContrast2_1_5,hdiContrast2_123_45))

                     lower    upper     lower    upper
WithoutMetricPred 15.21982 31.95605  9.713508 20.41661
WithMetricPred    13.40625 25.09579 11.281726 18.74789

Using HDI width:

rbind(WithoutMetricPred=c(Contrast_1_5=unname(diff(hdiContrast_1_5)),
                          Contrast_123_45=unname(diff(hdiContrast_123_45))),
      WithMetricPred=c(Contrast_1_5=diff(hdiContrast2_1_5),
                       Contrast_123_45=diff(hdiContrast2_123_45)))

                  Contrast_1_5 Contrast_123_45
WithoutMetricPred     16.73622       10.703103
WithMetricPred        11.68954        7.466168

Obviously, the width was shrinkage so with the metric predictor in the model, we can explain more the variability of Longevity due to the difference among groups and Thorax.

Heterogeneous variances

The most restrictive assumption of ANOVA is equivalence of variances in all groups.

Look how Bayesian approach can remove this assumption.

On the following model diagram there is an implementation of model with heterogeneous variances.

The model uses t-distributed noise and allows each group to have its own standard deviation.

Data set InsectSprays from datasets shows test of 6 different insect sprays. Column count contains numbers of insects found in the field after each spraying. Column spray identifies the type of spray.

Prepare the data.

df <- InsectSprays
head(df)

  count spray
1    10     A
2     7     A
3    20     A
4    14     A
5    14     A
6    12     A

levels(df$spray)

[1] "A" "B" "C" "D" "E" "F"

dataListSprays <- list(Ntotal = nrow(df),
                       y = df$count,
                       x = as.integer(df$spray),
                       NxLvl = nlevels(df$spray),
                       aGammaShRa = unlist(gammaShRaFromModeSD(mode=sd(df$count)/2, 
                                                               sd=2*sd(df$count))))

Plot the data.

plot(df$count ~ df$spray)

Note that variances within the groups are very different. If estimated overall the variance for groups “C”, “D” and “E” is overestimated and for “A” “B” and “F” - underestimated.

Apply traditional ANOVA method.

m1 <- lm(count~spray, df)
summary(m1)


Call:
lm(formula = count ~ spray, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-8.333 -1.958 -0.500  1.667  9.333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  14.5000     1.1322  12.807  < 2e-16 ***
sprayB        0.8333     1.6011   0.520    0.604    
sprayC      -12.4167     1.6011  -7.755 7.27e-11 ***
sprayD       -9.5833     1.6011  -5.985 9.82e-08 ***
sprayE      -11.0000     1.6011  -6.870 2.75e-09 ***
sprayF        2.1667     1.6011   1.353    0.181    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.922 on 66 degrees of freedom
Multiple R-squared:  0.7244,    Adjusted R-squared:  0.7036 
F-statistic:  34.7 on 5 and 66 DF,  p-value: < 2.2e-16

anova(m1)

Analysis of Variance Table

Response: count
          Df Sum Sq Mean Sq F value    Pr(>F)    
spray      5 2668.8  533.77  34.702 < 2.2e-16 ***
Residuals 66 1015.2   15.38                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#m.aov <- aov(count~spray, df)
#summary(m.aov)

The categorical factor is significant, i.e. utility test fails.

Create contrasts to identify differences between combinations of sprays.

A <- factor(df$spray)
contrasts(A)

  B C D E F
A 0 0 0 0 0
B 1 0 0 0 0
C 0 1 0 0 0
D 0 0 1 0 0
E 0 0 0 1 0
F 0 0 0 0 1

contrasts(A)<-cbind("GA vs GB"=c(1,-1,0,0,0,0),
                    "GAB vs GF"=c(1,1,0,0,0,-2),
                    "GC vs GD"=c(0,0,1,-1,0,0),
                    "GC vs GE"=c(0,0,1,0,-1,0),
                    "GABF vs GCDE"=c(1,1,-1,-1,-1,1))
A

 [1] A A A A A A A A A A A A B B B B B B B B B B B B C C C C C C C C C
[34] C C C D D D D D D D D D D D D E E E E E E E E E E E E F F F F F F
[67] F F F F F F
attr(,"contrasts")
  GA vs GB GAB vs GF GC vs GD GC vs GE GABF vs GCDE
A        1         1        0        0            1
B       -1         1        0        0            1
C        0         0        1        1           -1
D        0         0       -1        0           -1
E        0         0        0       -1           -1
F        0        -2        0        0            1
Levels: A B C D E F

aov(df$count ~ A)

Call:
   aov(formula = df$count ~ A)

Terms:
                       A Residuals
Sum of Squares  2668.833  1015.167
Deg. of Freedom        5        66

Residual standard error: 3.921902
Estimated effects may be unbalanced

summary.lm(aov(df$count ~ A))


Call:
aov(formula = df$count ~ A)

Residuals:
   Min     1Q Median     3Q    Max 
-8.333 -1.958 -0.500  1.667  9.333 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    9.500e+00  4.622e-01  20.554   <2e-16 ***
AGA vs GB     -4.167e-01  8.006e-01  -0.520    0.604    
AGAB vs GF    -5.833e-01  4.622e-01  -1.262    0.211    
AGC vs GD     -1.417e+00  9.244e-01  -1.533    0.130    
AGC vs GE      8.374e-16  9.244e-01   0.000    1.000    
AGABF vs GCDE  6.000e+00  4.622e-01  12.981   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.922 on 66 degrees of freedom
Multiple R-squared:  0.7244,    Adjusted R-squared:  0.7036 
F-statistic:  34.7 on 5 and 66 DF,  p-value: < 2.2e-16

Summary shows that sprays A and B, C and D, C and E are indistinguishable as well as combined observations of sprays A and B vs. F, but combined observations for sprays A, B, F are significantly different from C, D and E.

However, separate t-test of C vs. D shows significant difference and equivalence of C and E is almost rejected by t-test.

t.test(df$count[df$spray=="C"], df$count[df$spray=="D"])


    Welch Two Sample t-test

data:  df$count[df$spray == "C"] and df$count[df$spray == "D"]
t = -3.0782, df = 20.872, p-value = 0.00573
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.7482230 -0.9184437
sample estimates:
mean of x mean of y 
 2.083333  4.916667

t.test(df$count[df$spray=="C"], df$count[df$spray=="E"])


    Welch Two Sample t-test

data:  df$count[df$spray == "C"] and df$count[df$spray == "E"]
t = -1.868, df = 21.631, p-value = 0.07536
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.9909888  0.1576555
sample estimates:
mean of x mean of y 
 2.083333  3.500000

These contradicting results are caused by the fact that for comparison of C with D and E the assumption of homoscedasticity leads to overestimation of noise in general framework of ANOVA, but pairwise t-test is only accounting for variances of the compared pairs.

Bayes’

Run MCMC.

modelString<-"
data {
    int<lower=1> Ntotal;
    real y[Ntotal];
    int<lower=2> NxLvl;
    int<lower=1, upper=NxLvl> x[Ntotal];
    real<lower=0> aGammaShRa[2];
}
transformed data {
    real meanY;
    real sdY;
    meanY = mean(y);
    sdY = sd(y);
}
parameters {
    real<lower=0> nu;
    real a0;
    real<lower=0> aSigma;
    vector[NxLvl] a;
    real<lower=0> ySigma[NxLvl];
    real<lower=0> ySigmaMode;
    real<lower=0> ySigmaSD;
}
transformed parameters{
    real<lower=0> ySigmaSh;
    real<lower=0> ySigmaRa;
    ySigmaRa = ( ( ySigmaMode + sqrt( ySigmaMode^2 + 4*ySigmaSD^2 ) ) / ( 2*ySigmaSD^2 ) );
    ySigmaSh = 1 + ySigmaMode * ySigmaRa;
}
model {
    nu ~ exponential(1/30.0);
    a0 ~ normal(meanY, 10*sdY);
    aSigma ~ gamma(aGammaShRa[1], aGammaShRa[2]);
    a ~ normal(0, aSigma);
    ySigma ~ gamma(ySigmaSh, ySigmaRa);
    ySigmaMode ~ gamma(aGammaShRa[1], aGammaShRa[2]);
    ySigmaSD ~ gamma(aGammaShRa[1], aGammaShRa[2]);
    for ( i in 1:Ntotal ) {
        y[i] ~ student_t(nu, a0 + a[x[i]], ySigma[x[i]]);
    }
}
generated quantities {
    // Convert a0,a[] to sum-to-zero b0,b[] :
        real b0;
    vector[NxLvl] b;
    b0 = a0 + mean(a);
    b = a - mean(a);
}
"

model3 <- stan_model(model_code = modelString)

# saveRDS(model3,file = "data/stanDSO3.Rds")
model3 <- readRDS(file = "data/stanDSO3.Rds")

Run chains.

fit.sprays <- sampling(model3, 
                data=dataListSprays, 
                pars=c('b0', 'b', 'aSigma', 'ySigma', 'nu', 'ySigmaMode', 'ySigmaSD'),
                iter=5000, chains = 2, cores = 4)

launch_shinystan(fit)

Analyze estimates.

plot(fit.sprays,pars=c("b0","aSigma"))

plot(fit.sprays,pars=c("b"))

plot(fit.sprays,pars=c("ySigma"))

Extract chains.

chains_sprays <- rstan::extract(fit.sprays)
head(chains_sprays$b)

          
iterations     [,1]     [,2]      [,3]      [,4]      [,5]     [,6]
      [1,] 5.321336 5.903195 -7.231029 -5.126114 -5.243187 6.375798
      [2,] 6.009771 4.809611 -7.284751 -4.387986 -6.658077 7.511432
      [3,] 3.685636 5.496733 -6.662345 -3.278268 -5.640332 6.398575
      [4,] 3.609592 4.813631 -6.971538 -2.848964 -6.461671 7.858950
      [5,] 7.089465 4.230392 -7.824936 -5.013679 -6.613809 8.132567
      [6,] 4.836717 6.068238 -6.645967 -4.104199 -5.881397 5.726608

Look at the same contrasts.

A vs. B

contrast1_2 <- chains_sprays$b[,1] - chains_sprays$b[,2]
plot(contrast1_2)

hist(contrast1_2)

(hdiContrast1_2<-hdi(contrast1_2))

    lower     upper 
-4.566850  3.072822 
attr(,"credMass")
[1] 0.95

(sd.contrast1_2<-sd(contrast1_2))

[1] 1.92444

plot(rank(chains_sprays$b[,1]),rank(chains_sprays$b[,5]))

The contrast is not different from zero.

A, B vs. F

Comb1<-chains_sprays$b[,1] + chains_sprays$b[,2]
Comb2<-2*chains_sprays$b[,6]
contrast12_6 <-Comb1  - Comb2
plot(contrast12_6)

hist(contrast12_6)

(hdiContrast12_6<-hdi(contrast12_6))

    lower     upper 
-11.15426   5.21297 
attr(,"credMass")
[1] 0.95

(sd.contrast12_6<-sd(contrast12_6))

[1] 4.148463

plot(rank(Comb1),rank(Comb2))

The contrast is not different from zero.

C vs. D

contrast3_4 <- chains_sprays$b[,3] - chains_sprays$b[,4]
plot(contrast3_4)

hist(contrast3_4)

(hdiContrast3_4<-hdi(contrast3_4))

     lower      upper 
-4.8309758 -0.7441633 
attr(,"credMass")
[1] 0.95

(sd.contrast3_4<-sd(contrast3_4))

[1] 1.01399

plot(rank(chains_sprays$b[,3]),rank(chains_sprays$b[,4]))

This contrast is different from zero. ANOVA could not detect that.

C vs. E

contrast3_5 <- chains_sprays$b[,3] - chains_sprays$b[,5]
plot(contrast3_5)

hist(contrast3_5)

(hdiContrast3_5<-hdi(contrast3_5))

    lower     upper 
-3.206783  0.386665 
attr(,"credMass")
[1] 0.95

(sd.contrast3_5<-sd(contrast3_5))

[1] 0.9066876

plot(rank(chains_sprays$b[,3]),rank(chains_sprays$b[,5]))

The contrast is not different from zero.

Combined observations A, B and F vs. C, D, E

Comb1<-chains_sprays$b[,1] + chains_sprays$b[,2] + chains_sprays$b[,6]
Comb2<-chains_sprays$b[,3] + chains_sprays$b[,4] + chains_sprays$b[,5]
contrast126_345 <-Comb1  - Comb2
plot(contrast126_345)

hist(contrast126_345)

(hdiContrast126_345<-hdi(contrast126_345))

   lower    upper 
29.35368 40.63068 
attr(,"credMass")
[1] 0.95

(sd.contrast126_345<-sd(contrast126_345))

[1] 2.911281

plot(rank(Comb1),rank(Comb2))

The contrast is significantly different from zero.

Series 6 – ANOVA & ANCOVA

One-way ANOVA

Sex and Death example of section 19.3 in (Kruschke 2015)

Traditional analysis

Bayesian approach

Two-way ANOVA

Model description

Data of section 20.1 in (Kruschke 2015)

MCMC

Working with chains and contrasts

Contrast for comparison of departments

Contrast for comparison of positions

Contrast for comparison of spreads

Understanding the effect of scaling and transformations on interactions

ANCOVA

Traditional analysis

Bayesian approach

Heterogeneous variances

Apply traditional ANOVA method.

Bayes’

Further Readings

References

Corrections

Reuse

Citation